Introduction

The Case of Jack the Ripper:

Image source: Crime Traveler (July 2018): Jack The Ripper Was Three Killers: New Theory in Sherlock Holmes and the Autumn of Terror. https://www.crimetraveller.org/2017/06/jack-the-ripper-was-three-killers-sherlock-holmes-autumn-of-terror/



In 1888, an unknown killer caused fear and mayhem in the streets of London by murdering five women. The public knew the killer only as Jack the Ripper, and the case remains one of the most famous unsolved mysteries of all time, having perplexed detectives and scholars alike for 130 years. The authorities of the time had only rudimentary techniques for collecting evidence and were never able to narrow their investigation down to a single suspect. Very little evidence survives today that could finally identify the killer. However, Jack the Ripper taunted the investigators of his (or possibly her) crimes through letters, and these letters survive to this day. Using data mining techniques, we compare the famous Jack the Ripper letters with writings and other prose from known suspects to see whether the killer is among them.

Primary Source Documents

Image source: Casebook (2013), Ripper Letters. https://www.casebook.org/ripper_letters/



We began our data collection by acquiring the texts of the original Jack the Ripper letters and saving each letter as an individual text file. Next, we researched prominent suspects in the Jack the Ripper case and gathered writings and quotes attributed to each of them. Our data set includes writings, testimonies, and quotes from six different suspects. All of the suspects' primary source documents were likewise split into individual text files.
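The write-up does not say how the files were organized. As a minimal sketch, assuming a hypothetical layout with one folder per author (for example `corpus/Jack the Ripper/letter_01.txt`), the individual text files could be gathered into (author, document, text) records as below. Python is used here for illustration only; the original analysis was done in R, where “readtext” fills this role.

```python
from pathlib import Path
from typing import NamedTuple


class Document(NamedTuple):
    """One primary-source text: who wrote it, which file it came from, and its contents."""
    author: str
    doc_id: str
    text: str


def load_corpus(root: str) -> list[Document]:
    """Read every .txt file under root.

    Assumes a hypothetical layout with one sub-folder per author, e.g.
    root/Jack the Ripper/letter_01.txt; the folder name becomes the author label.
    """
    docs = []
    for path in sorted(Path(root).glob("*/*.txt")):
        docs.append(Document(author=path.parent.name,
                             doc_id=path.stem,
                             text=path.read_text(encoding="utf-8")))
    return docs
```

Keeping the author in the folder name means the label travels with each file, so every later step can group or classify by author without a separate lookup table.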

Data Preprocessing


The next step was to get our data into a usable format. This required three R packages: “tidytext”, “readtext”, and “tidyverse”. Using “readtext”, we read the text files into a data frame. Then, using “tidytext” and the “tidyverse”, we split the texts into words and computed word frequencies. Once the data frame was in a usable format, we transformed the frequencies using a min/max transformation, which rescales values to the range 0 to 1.
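The paragraph does not specify how the word frequencies were arranged or exactly which values the min/max transformation was applied to. A minimal Python sketch, assuming counts are taken per author and each word's counts are rescaled across authors with (x - min) / (max - min), looks like this. The regular-expression tokenizer is a simplification assumed for illustration, not tidytext's exact tokenization rules, and the function names are hypothetical.

```python
import re
from collections import Counter

# Simplified tokenizer (an assumption, not tidytext's exact rules):
# lower-case runs of letters, optionally with one apostrophe part such as "don't".
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_counts(text: str) -> Counter:
    """Count the words in one text after lower-casing it."""
    return Counter(WORD.findall(text.lower()))


def frequency_table(texts_by_author: dict[str, str]) -> dict[str, dict[str, int]]:
    """Word counts per author over a shared, sorted vocabulary (absent words count 0)."""
    counts = {author: word_counts(text) for author, text in texts_by_author.items()}
    vocab = sorted(set().union(*counts.values()))
    return {author: {w: c[w] for w in vocab} for author, c in counts.items()}


def min_max_scale(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Rescale each word's counts across authors to [0, 1] via (x - min) / (max - min).

    A word with the same count for every author has max == min and is mapped to 0.0.
    """
    authors = list(table)
    vocab = list(table[authors[0]]) if authors else []
    scaled = {a: {} for a in authors}
    for w in vocab:
        column = [table[a][w] for a in authors]
        lo, hi = min(column), max(column)
        for a in authors:
            scaled[a][w] = 0.0 if hi == lo else (table[a][w] - lo) / (hi - lo)
    return scaled
```

For example, with the texts `{"A": "the knife the night", "B": "the letter"}`, the word "the" is counted 2 and 1 times, so after scaling it becomes 1.0 for A and 0.0 for B. Scaling per word keeps very common words from dominating distance-based methods such as k-means.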

Exploratory Analysis

Jack the Ripper Word Cloud:


K-means Clustering
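The section shows only the clustering result, not the method's settings. For readers unfamiliar with k-means, here is a minimal sketch of Lloyd's algorithm in plain Python, assuming each author or document is represented as a vector of min/max-scaled word frequencies. The number of clusters `k`, the random initialization, and the iteration cap are illustrative choices, not the settings used in the original R analysis (R's built-in `kmeans()` optimizes the same objective).

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means.

    points: list of equal-length numeric sequences (e.g. scaled word frequencies).
    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialize with k randomly chosen data points (one simple, common choice).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

If a letter's vector lands in the same cluster as a suspect's writings, that suggests similar word usage; it is evidence of stylistic similarity, not of authorship.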


Decision Tree
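
The decision tree's predictions for the Jack the Ripper data assigned all 1,569 observations to Prince Albert:
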
prediction               Jack the Ripper
  Carl Feigenbaum                      0
  Joe Barnett                          0
  Lewis Carroll                        0
  Mary Pearcey                         0
  Prince Albert                     1569
  Walter Richard Sickert               0
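A table like the one above is a cross-tabulation (confusion matrix) of predicted labels against actual labels; R's `table()` prints this kind of layout. As a hedged Python sketch, the same counts can be reproduced with `collections.Counter`. The helper names `cross_tab` and `format_column` are hypothetical and the inputs would come from whatever model is being evaluated.

```python
from collections import Counter


def cross_tab(predicted, actual):
    """Count (predicted label, actual label) pairs, i.e. a confusion matrix as a Counter.

    Pairs that never occur read as 0, which is how the table shows zero counts.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return Counter(zip(predicted, actual))


def format_column(counts, row_labels, column_label):
    """Render one actual-label column as text rows: predicted label, then its count."""
    width = max(len(r) for r in row_labels) + 4
    lines = ["prediction".ljust(width) + column_label]
    for r in row_labels:
        count = counts[(r, column_label)]
        lines.append(r.ljust(width) + str(count).rjust(len(column_label)))
    return "\n".join(lines)
```

For example, `cross_tab(["Prince Albert"] * 3, ["Jack the Ripper"] * 3)` counts 3 for the pair ("Prince Albert", "Jack the Ripper") and 0 for every other suspect, the same pattern the decision tree produced.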


Support Vector Machine (SVM)
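The section does not describe the SVM setup (kernel, cost parameter, or how the six suspects were handled). As a hedged illustration of the underlying idea only, the sketch below trains a linear SVM with the Pegasos stochastic subgradient method. This technique is swapped in here for readability; in R one would more commonly call a library implementation such as `svm()` from the e1071 package. The sketch handles two classes labeled +1 and -1 (for example, one hypothetical suspect versus the rest). With several suspects, a one-vs-rest scheme trains one such model per suspect and picks the suspect with the highest `decision_value`.

```python
import random


def decision_value(w, x):
    """Signed score w . [x, 1]; the appended 1.0 lets the last weight act as a bias."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_linear_svm(xs, ys, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic subgradient descent on the linear SVM objective

        lam / 2 * ||w||^2 + mean(max(0, 1 - y * w . [x, 1]))

    Labels ys must be +1 or -1. The bias weight is regularized too (a simplification).
    """
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w = [0.0] * (len(xs[0]) + 1)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            active = y * decision_value(w, x) < 1  # is the hinge loss non-zero here?
            w = [(1.0 - eta * lam) * wi for wi in w]  # step for the regularizer term
            if active:  # step for the hinge-loss term
                w = [wi + eta * y * xi for wi, xi in zip(w, list(x) + [1.0])]
    return w


def predict(w, x):
    """Classify x as +1 or -1 by the side of the hyperplane it falls on."""
    return 1 if decision_value(w, x) >= 0 else -1
```

A linear model is a reasonable first choice for word-frequency features, which are high-dimensional; a kernel SVM may fit better but is harder to interpret.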


Comparison Analysis


Conclusions

Did we finally solve the mystery of Jack the Ripper?